In this notebook, we look at the oxidation present in the raw data set and show that taking it into account, or not, changes the merged data set.
In [1]:
from msdas import *
%pylab inline
In [2]:
r = MassSpecReader(get_yeast_raw_data())
In [3]:
r.df.shape
Out[3]:
In [4]:
r.plot_phospho_stats()
In [5]:
# Which rows contain "Oxidation" in their sequence?
df = r.df[r.df.Sequence_Phospho.apply(lambda x: "Oxidation" in x)]
# We can build a new MassSpecReader instance from this dataframe:
oxidation = MassSpecReader(df)
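The substring test used above can be tried without msdas. The following is a minimal sketch on made-up rows: the sequences, the `M(Oxidation)` tag format and the helper `has_oxidation` are assumptions for illustration, not taken from the yeast data. It also guards against missing sequences, since `"Oxidation" in x` raises a `TypeError` when `x` is not a string.

```python
# Toy rows standing in for r.df; sequences and tag format are hypothetical.
rows = [
    {"Protein": "STE12", "Sequence_Phospho": "AAM(Oxidation)S(Phospho)K"},
    {"Protein": "STE12", "Sequence_Phospho": "AAMS(Phospho)K"},
    {"Protein": "FUS3", "Sequence_Phospho": None},  # missing sequence
]


def has_oxidation(sequence):
    """Return True if the sequence is a string containing an Oxidation tag."""
    # The isinstance check avoids a TypeError on missing (non-string) values.
    return isinstance(sequence, str) and "Oxidation" in sequence


# Keep only the rows whose sequence carries the tag.
oxidised_rows = [row for row in rows if has_oxidation(row["Sequence_Phospho"])]
print(len(oxidised_rows), "of", len(rows), "rows carry an Oxidation tag")
```

If the real column can hold missing values, the same guard could be placed inside the lambda passed to `apply`.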
In [6]:
oxidation.df.shape
Out[6]:
In [7]:
oxidation.plot_phospho_stats()
# The oxidation subset looks representative of the full data set (see figures above).
In [8]:
# Similarly for the number of NAs
clf()
r.get_na_count().hist(normed=True, alpha=0.5)
oxidation.get_na_count().hist(normed=True, alpha=0.5, color="green")
# Here we see that number of NAs
Out[8]:
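The two histograms above are normalised because the full data set and the oxidation subset have very different numbers of rows, so raw counts would not be comparable. The sketch below illustrates the idea in plain Python. It assumes that the NA count is the number of missing measurements per row; the measurements and the helper `na_count_distribution` are hypothetical.

```python
from collections import Counter


def na_count_distribution(rows):
    """Return the fraction of rows having 0, 1, 2, ... missing values."""
    # Count the missing entries (None) in each row, then tally the counts.
    counts = Counter(sum(value is None for value in row) for row in rows)
    total = len(rows)
    return {n_missing: counts[n_missing] / total for n_missing in sorted(counts)}


# Hypothetical measurements; None marks a missing value (NA).
full_set = [
    [1.0, 2.0, None],
    [1.5, None, None],
    [0.3, 0.4, 0.5],
    [2.0, 2.1, None],
]
subset = [
    [1.0, 2.0, None],
    [0.3, 0.4, 0.5],
]

# Fractions sum to 1 in both cases, whatever the number of rows.
print(na_count_distribution(full_set))
print(na_count_distribution(subset))
```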
Let us figure out if some proteins in the small data set have peptides with oxidation:
In [9]:
y = MassSpecReader(get_yeast_small_data())
In [10]:
proteins = list(set(y.df.Protein))
In [11]:
filter_proteins = oxidation.df.Protein.apply(lambda x: x in proteins)
subdf = oxidation.df[filter_proteins].Protein
found = list(set(subdf.values))
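The lookup above amounts to intersecting two sets of protein names. A minimal sketch with made-up names (the two lists below are hypothetical, not the contents of the yeast data sets):

```python
# Hypothetical protein names standing in for y.df.Protein and oxidation.df.Protein.
small_proteins = ["STE12", "FUS3", "DIG1"]
oxidised_proteins = ["STE12", "HOG1", "STE12", "DIG1"]

# Proteins present in both data sets; sets remove duplicates, and
# sorting gives a stable order for display.
shared = sorted(set(small_proteins) & set(oxidised_proteins))
print(shared)
```

Testing membership in a set takes constant time on average, whereas `x in proteins` scans the list on every call, so the set form may be preferable for large data sets.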
In [12]:
found
Out[12]:
Effect of oxidation on the merging from the raw data to the small data set
We will look at the STE12 case found in the list above.
In [13]:
Y = replicates.ReplicatesYeast(get_yeast_raw_data(), verbose=True, cleanup=True)
Y.normalise()
In [14]:
clf();
res1 = y.plot_timeseries("STE12_S400")
res2 = Y.plot_timeseries("STE12_S400", color="g", markersize=5)
Here, the data from the small data set are shown in red.
In green are the two rows of the raw data that correspond to peptide STE12_S400: one with the oxidation tag and one without. Both rows are the same peptide (see next cell).
One of the green series matches the small data set exactly, which shows that peptides with oxidation are removed, as confirmed by looking at the data set.
In [15]:
Y['STE12_S400']
Out[15]:
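The observation above, two raw rows for one identifier of which only the non-oxidised one survives, can be reproduced on toy data. This is a sketch of the idea only, not the msdas implementation: the column name `Identifier`, the sequences, the values and both helper functions are assumptions.

```python
from collections import defaultdict

# Two hypothetical raw rows sharing one identifier; only the first is oxidised.
rows = [
    {"Identifier": "STE12_S400",
     "Sequence_Phospho": "M(Oxidation)LS(Phospho)K",
     "values": [0.1, 0.4, 0.9]},
    {"Identifier": "STE12_S400",
     "Sequence_Phospho": "MLS(Phospho)K",
     "values": [0.2, 0.5, 1.0]},
]


def drop_oxidised(rows):
    """Discard rows whose sequence carries an Oxidation tag."""
    return [row for row in rows if "Oxidation" not in row["Sequence_Phospho"]]


def group_by_identifier(rows):
    """Collect the time series of all rows sharing an identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Identifier"]].append(row["values"])
    return dict(groups)


before = group_by_identifier(rows)
after = group_by_identifier(drop_oxidised(rows))

# Two series before the filter, one after: the non-oxidised row is kept as is.
print(len(before["STE12_S400"]), "->", len(after["STE12_S400"]))
```

Keeping the oxidised row instead would leave two series to combine for the same identifier, which is why the choice affects the merged data set.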